feat(flexible-outcalls): handle flexible errors#9709

Merged
fspreiss merged 21 commits into master from fspreiss/flexible-errors on Apr 13, 2026

Conversation

@fspreiss
Contributor

@fspreiss fspreiss commented Apr 1, 2026

Summary

Adds error-path support for flexible HTTP outcalls in the consensus payload. Previously, only the success path (FlexibleCanisterHttpResponses) was handled. This PR introduces a new flexible_errors field on CanisterHttpPayload to carry three error types through consensus:

  • Timeout: The request exceeded CANISTER_HTTP_TIMEOUT_INTERVAL. Flexible timeouts are now reported via flexible_errors (instead of the regular timeouts vec) so they can later be encoded as FlexibleHttpRequestResult::Err with a global_error, enabling programmatic detection. For now, flexible timeouts carry no additional information; however, the type is designed so that it could carry more in the future (e.g., all rejects/error responses).

  • TooManyRequestErrors: More HTTP adapter nodes returned reject responses than the slack allows (committee.len() - min_responses), meaning min_responses OK responses can never be reached. Carries the reject response+proof pairs for validation.

  • ResponsesTooLarge: Even the smallest min_responses OK responses (by content_size) exceed MAX_CANISTER_HTTP_PAYLOAD_SIZE when summed. Carries the metadata shares as proof. The validation logic still has a bug, which will be fixed in CON-1709.

This addresses CON-1691, CON-1692, and CON-1693.
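
For readers unfamiliar with the thresholds, the two share-based error conditions above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Share`, `FlexibleError`, `Outcome`, and `classify_shares` are hypothetical stand-ins, sizes are plain `u64` values, signatures and proofs are omitted, and timeouts (which depend on request expiry rather than on shares) are left out. It also looks only at the shares seen so far.

```rust
/// Hypothetical, simplified stand-in for a response share (no signatures or proofs).
#[derive(Clone, Debug, PartialEq)]
enum Share {
    Ok { content_size: u64 },
    Reject,
}

/// Hypothetical error type mirroring the two share-based errors described above.
#[derive(Debug, PartialEq)]
enum FlexibleError {
    TooManyRequestErrors { rejects: usize },
    ResponsesTooLarge { total_size: u64 },
}

/// Hypothetical result type: the sizes of the selected OK responses, an error, or "keep waiting".
#[derive(Debug, PartialEq)]
enum Outcome {
    OkResponses(Vec<u64>),
    Error(FlexibleError),
    Pending,
}

fn classify_shares(
    shares: &[Share],
    committee_size: usize,
    min_responses: usize,
    max_payload_size: u64,
) -> Outcome {
    // Slack: how many committee members may reject while `min_responses`
    // OK responses are still reachable.
    let slack = committee_size.saturating_sub(min_responses);
    let rejects = shares.iter().filter(|s| matches!(s, Share::Reject)).count();
    if rejects > slack {
        return Outcome::Error(FlexibleError::TooManyRequestErrors { rejects });
    }

    // Collect the OK sizes and sort ascending so smaller responses are favored.
    let mut ok_sizes: Vec<u64> = shares
        .iter()
        .filter_map(|s| match s {
            Share::Ok { content_size } => Some(*content_size),
            Share::Reject => None,
        })
        .collect();
    ok_sizes.sort_unstable();

    if ok_sizes.len() < min_responses {
        return Outcome::Pending;
    }

    // Simplification: only the smallest `min_responses` sizes seen so far are summed.
    ok_sizes.truncate(min_responses);
    let total_size: u64 = ok_sizes.iter().sum();
    if total_size > max_payload_size {
        Outcome::Error(FlexibleError::ResponsesTooLarge { total_size })
    } else {
        Outcome::OkResponses(ok_sizes)
    }
}
```

With a committee of 4 and `min_responses` of 3, the slack is 1, so two rejects in this sketch already make the request unrecoverable, while one reject plus one OK response is still pending.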

Key changes

  • find_flexible_result replaces find_flexible_responses (utils.rs): A single scan now sorts shares by content_size ascending (favoring smaller responses), collects OK and reject responses, and returns one of OkResponses, Error(TooManyRequestErrors | ResponsesTooLarge), or Pending.

  • Payload building: The main loop integrates find_flexible_result and routes flexible timeouts to flexible_errors instead of timeouts.

  • Payload validation: A new validation block for flexible_errors checks:

    • Timeout: request is actually expired.
    • TooManyRequestErrors: entries are rejects, from unique committee members with valid signatures, and the reject count exceeds the allowed slack.
    • ResponsesTooLarge: shares are from unique committee members with valid signatures, and the smallest min_responses entry sizes truly exceed the payload limit.
  • Validation refactoring (utils.rs): Common per-entry and per-share validation logic is extracted into validate_flexible_response_with_proof and validate_response_share, eliminating duplication between OK-response and error-response validation.

  • CountBytes precision fix (canister_http.rs): CanisterHttpResponseMetadata::count_bytes() now uses precise field-by-field calculation instead of size_of::<Self>(), which was incorrect for heap-allocated fields like CryptoHash and ReplicaVersion.
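
To illustrate why `size_of::<Self>()` undercounts types with heap-allocated fields, here is a minimal sketch. The `Metadata` struct and both methods are hypothetical; `Vec<u8>` and `String` merely stand in for heap-backed fields and do not reflect the real layout of `CanisterHttpResponseMetadata`.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical stand-in for a metadata struct with heap-allocated fields.
struct Metadata {
    id: u64,
    content_hash: Vec<u8>,   // heap-allocated bytes
    replica_version: String, // heap-allocated text
}

impl Metadata {
    /// Shallow count: only the inline struct (pointer, length, capacity of
    /// each heap field), independent of how much data the fields hold.
    fn count_bytes_shallow(&self) -> usize {
        size_of::<Self>()
    }

    /// Field-by-field count: the bytes each field actually holds.
    fn count_bytes_precise(&self) -> usize {
        size_of_val(&self.id) + self.content_hash.len() + self.replica_version.len()
    }
}
```

In this sketch, two values with a 32-byte and a 1000-byte hash report the same shallow size, whereas the field-by-field count grows with the data.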

Not included

into_messages conversion of flexible_errors into Candid-encoded FlexibleHttpRequestResult::Err will be done in a follow-up.

@github-actions github-actions bot added the feat label Apr 1, 2026
@fspreiss fspreiss force-pushed the fspreiss/flexible-errors branch from 4daca11 to aab90a1 Compare April 7, 2026 12:31
@fspreiss fspreiss marked this pull request as ready for review April 8, 2026 07:15
@fspreiss fspreiss requested review from a team as code owners April 8, 2026 07:15
@fspreiss fspreiss requested a review from eichhorl April 8, 2026 07:15
Contributor

@maksymar maksymar left a comment

DSM-team scope LGTM

Comment thread rs/https_outcalls/consensus/src/payload_builder/tests.rs
Comment thread rs/https_outcalls/consensus/src/payload_builder/tests.rs Outdated
Comment thread rs/https_outcalls/consensus/src/payload_builder/tests.rs
Comment thread rs/types/types/src/canister_http.rs
Comment thread rs/https_outcalls/consensus/src/payload_builder/utils.rs
Comment thread rs/https_outcalls/consensus/src/payload_builder/utils.rs Outdated
Comment thread rs/https_outcalls/consensus/src/payload_builder.rs
Comment thread rs/https_outcalls/consensus/src/payload_builder.rs Outdated
Comment thread rs/https_outcalls/consensus/src/payload_builder/utils.rs
Comment thread rs/https_outcalls/consensus/src/payload_builder/utils.rs Outdated
Comment thread rs/https_outcalls/consensus/src/payload_builder.rs
Comment thread rs/https_outcalls/consensus/src/payload_builder/utils.rs Outdated
@fspreiss fspreiss added this pull request to the merge queue Apr 13, 2026
Merged via the queue into master with commit 1e016b7 Apr 13, 2026
38 checks passed
@fspreiss fspreiss deleted the fspreiss/flexible-errors branch April 13, 2026 14:06
daniel-wong-dfinity-org pushed a commit that referenced this pull request Apr 15, 2026
3 participants